InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation
Chong Zhang, Yukun Ma, Qian Chen, Wen Wang, Shengkui Zhao, Zexu Pan, Hao Wang, Chongjia Ni, Trung Hieu Nguyen, Kun Zhou, Yidi Jiang, Chaohong Tan, Zhifu Gao, Zhihao Du, Bin Ma
We introduce InspireMusic, a framework that integrates super resolution and a large language model for high-fidelity long-form music generation. The unified framework generates high-fidelity music, songs, and audio by combining an autoregressive transformer with a super-resolution flow-matching model, enabling controllable generation of high-fidelity long-form music at a higher sampling rate from both text and audio prompts. Unlike previous approaches, our model uses an audio tokenizer with a single codebook that carries richer semantic information, reducing training cost and improving efficiency. This combination yields high-quality audio generation with long-form coherence of up to $8$ minutes. First, an autoregressive transformer model based on Qwen 2.5 predicts audio tokens. Then, a super-resolution flow-matching model generates high-sampling-rate audio with fine-grained details learned from an acoustic codec model. Comprehensive experiments show that the InspireMusic-1.5B-Long model performs comparably to recent top-tier open-source systems, including MusicGen and Stable Audio 2.0, on both subjective and objective evaluations. The code and pre-trained models are released at https://github.com/FunAudioLLM/InspireMusic.
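The abstract describes a two-stage pipeline: an autoregressive transformer predicts single-codebook audio tokens, which a super-resolution flow-matching model then decodes into high-sampling-rate audio. A minimal sketch of that data flow, assuming toy stand-ins for every component (function names, the trivial next-token rule, and the interpolation "decoder" are all illustrative, not the released API):

```python
# Hypothetical sketch of an InspireMusic-style two-stage pipeline.
# All names, shapes, and rules below are illustrative assumptions.

def tokenize(audio, codebook_size=1024):
    """Assumed stage 0: map audio frames to ids from a single codebook
    (the paper uses one codebook rather than several)."""
    return [int(abs(x) * (codebook_size - 1)) % codebook_size for x in audio]

def autoregressive_lm(prompt_tokens, n_new, codebook_size=1024):
    """Assumed stage 1: an AR transformer (Qwen 2.5-based in the paper)
    predicts the next audio token; here a trivial stand-in rule."""
    out = list(prompt_tokens)
    for _ in range(n_new):
        out.append((out[-1] * 31 + 7) % codebook_size)  # toy next-token rule
    return out

def super_resolution_flow_matching(tokens, upsample=2):
    """Assumed stage 2: a flow-matching model decodes tokens into
    higher-sampling-rate audio; here plain linear interpolation."""
    coarse = [t / 1024.0 for t in tokens]
    fine = []
    for a, b in zip(coarse, coarse[1:]):
        for k in range(upsample):
            fine.append(a + (b - a) * k / upsample)
    fine.append(coarse[-1])
    return fine

# Prompt conditioning -> token prediction -> super-resolution decoding.
prompt = tokenize([0.1, 0.5, 0.9])
tokens = autoregressive_lm(prompt, n_new=5)
audio = super_resolution_flow_matching(tokens, upsample=4)
```

The point of the sketch is the interface between the stages: the language model only ever sees discrete tokens from one codebook, and fine acoustic detail is recovered afterwards by the separate super-resolution model.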
- Media > Music (1.00)
- Leisure & Entertainment (1.00)
Stable Audio Open
Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons
Open generative models are vitally important for the community, allowing for fine-tuning and serving as baselines when new models are presented. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained on Creative Commons data. Our evaluation shows that the model's performance is competitive with the state of the art across various metrics. Notably, the reported FDopenl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1kHz.
Stability AI's audio generator can now crank out 3 minute 'songs'
Stability AI just unveiled Stable Audio 2.0, an upgraded version of its music-generation platform. This system lets users create up to three minutes of audio via text prompt. Just imagine the fake birthday song you could make in the style of that one Rob Thomas/Santana track. The tool is free and publicly available through the company's website, so have at it. Introducing Stable Audio 2.0 – a new model capable of producing high-quality, full tracks with coherent musical structure up to three minutes long at 44.1 kHz stereo from a single prompt.
- Media > Music (0.53)
- Leisure & Entertainment (0.53)